Google AlphaEarth Foundations
Google's AlphaEarth Foundations is a geospatial embedding model trained on multiple Earth Observation (EO) datasets. The model was run on annual time series of images, and the resulting embeddings are available as an analysis-ready dataset in Earth Engine. With this dataset, users can build fine-tuning applications and other downstream tasks without running computationally expensive deep learning models themselves.
Understanding Embeddings
Embeddings are a way to compress large amounts of information into a smaller set of features that represent meaningful semantics. The AlphaEarth Foundations model uses time series of sensor images, including Sentinel-2, Sentinel-1, and Landsat, and learns to uniquely represent the mutual information between sources and targets with just 64 numbers. The input data stream contains thousands of image bands from various sensors, and the model transforms this high-dimensional input into a low-dimensional representation.
A good mental model for understanding how AlphaEarth Foundations works is a technique called principal component analysis (PCA). PCA also helps reduce data dimensionality for machine learning applications. While PCA is a statistical technique and can compress dozens of input bands into a few principal components, AlphaEarth Foundations is a deep learning model that can take thousands of input dimensions from multisensor time series datasets and learn to create a 64-band representation that uniquely captures the spatial and temporal variability of that pixel.
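The analogy can be made concrete with a toy sketch. The 200-feature input below is synthetic stand-in data, not real sensor imagery: PCA linearly compresses a wide feature matrix into a few components, whereas AlphaEarth Foundations learns a 64-dimensional code per pixel from far higher-dimensional inputs.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for a multisensor pixel stack: 1,000 pixels,
# each described by 200 input features (bands x time steps).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 200))

# PCA compresses the 200 features into a few linear components.
pca = PCA(n_components=8)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (1000, 8)
```

AlphaEarth Foundations plays an analogous role, but its 64-value representation is learned nonlinearly from multisensor time series rather than computed as a linear projection.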
An embedding field is the continuous matrix or "field" of learned embeddings. Images in the embedding field collections represent spatiotemporal trajectories spanning an entire year and have 64 bands (one for each embedding dimension).
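Consistent with the band names used in the visualization later in this notebook (A01, A16, A09), the 64 bands of each annual image are conventionally named A00 through A63, one per embedding dimension:

```python
# Generate the 64 embedding band names, one per dimension.
band_names = [f'A{i:02d}' for i in range(64)]
print(band_names[:3], band_names[-1])  # ['A00', 'A01', 'A02'] A63
```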
How AlphaEarth Foundations Works
AlphaEarth Foundations offers a powerful new lens for understanding our planet, solving two major challenges: data overload and inconsistent information.
First, it combines volumes of information from dozens of different public sources—optical satellite imagery, radar, 3D laser mapping, climate simulations, and more. It weaves all this information together to analyze the world's land and coastal waters in crisp, 10 x 10 meter squares, allowing it to track changes over time with remarkable accuracy.
Second, it makes this data practical to use. The system's key innovation is its ability to create a highly compact summary for each square. These summaries require 16 times less storage space than those produced by other AI systems we tested and dramatically reduce the cost of planetary-scale analysis.
This innovation allows scientists to do something previously impossible: create detailed, consistent maps of our world, on demand. Whether monitoring crop health, tracking deforestation, or spotting new construction, they no longer need to rely on a single satellite passing overhead. Now, they have a new kind of foundation for geospatial data.
To ensure AlphaEarth Foundations was ready for real-world use, we rigorously tested its performance. When compared to traditional methods and other AI mapping systems, AlphaEarth Foundations was consistently the most accurate. It excelled at a wide range of tasks across time periods, including land use identification and surface property estimation. Crucially, it achieved this in scenarios where label data was scarce. On average, AlphaEarth Foundations had a 24% lower error rate than the models we tested, demonstrating its superior learning efficiency.
AlphaEarth Foundations Architecture
(A) Block diagram of the general architecture of the network used for video analysis. Preprocessing converts raw observation data through normalization using global statistics, and acquisition timestamps are converted to sinusoidal timecodes. Individual source encoders transform the inputs to the same latent space before inputting them to the bulk of the model. Outputs are summarized using conditional timecodes, or "summary periods," unique to each decoded source and contrastive learning task. 𝜇 refers to the model embedding outputs.
(B) Model outputs are treated as the mean direction of a von Mises-Fisher distribution, and decoding proceeds by sampling this distribution and concatenating it with sensor geometry metadata and a timecode indicating the relative position in the valid period to be decoded. Decoding proceeds for all sources, with losses dependent on the characteristics of each source.
(C) To avoid collapse and improve performance, embeddings are compared to equivalent batch-rotated embeddings using a dot product. The absolute value of this quantity is minimized as a necessary condition for an empirically uniform distribution in 𝑆63.
(D) Block diagram of the bulk model, consisting of simultaneous paths at different resolutions to maintain efficiency and spatial accuracy.
(E) Contrastive learning between the video teacher-student model and the text encoder.
(F) Full 360° view of the 2023 annual embedding field, covering the Earth's land surface, including smaller islands, to within approximately ±8
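One way to read the batch-rotation comparison in panel (C) is the following NumPy sketch. The pairing scheme used here (a roll by one position within the batch) is an illustrative assumption, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 8 embeddings on the unit hypersphere S^63 (64 dimensions).
emb = rng.normal(size=(8, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# "Batch-rotated" partners: pair every embedding with a different
# sample from the same batch (here, a simple roll by one position).
rotated = np.roll(emb, shift=1, axis=0)

# Dot products between the pairs; driving |dot| toward zero is the
# stated necessary condition for a uniform spread over the sphere,
# which prevents the embeddings from collapsing to a single point.
dots = np.sum(emb * rotated, axis=1)
uniformity_penalty = np.mean(np.abs(dots))
```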
What can we do with the Satellite Embeddings dataset?
Similarity Search: You can choose a point anywhere on Earth—say, on a specific type of agricultural land or forest—and instantly find and map all other locations with similar surface and environmental conditions anywhere in the world.
Change Detection: By comparing embedding vectors for the same pixel from different years, you can easily identify changes and track processes such as urban expansion, wildfire impacts and recovery, and fluctuating reservoir water levels.
Automatic Clustering: Without pre-existing labels, you can use clustering algorithms to automatically group pixels into distinct categories. This spatiotemporal segmentation can reveal hidden patterns in the landscape, differentiating various types of forests, soils, or urban development.
Smarter Classification: You can create accurate maps with much less training data. For example, instead of needing tens of thousands of labeled points to map crop types with more conventional inputs, you might need only a few hundred per class, saving time and computation.
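The first two use cases reduce to simple vector arithmetic. The sketch below uses randomly generated mock embedding arrays (not real dataset values) to show the mechanics: a dot product against one reference pixel gives a similarity map, and a per-pixel dot product between two years gives a change map.

```python
import numpy as np

# Mock annual embedding "images" (height x width x 64, unit length),
# standing in for two years of the Satellite Embedding dataset.
def mock_embeddings(h=50, w=50, d=64, seed=0):
    rng = np.random.default_rng(seed)
    e = rng.normal(size=(h, w, d))
    return e / np.linalg.norm(e, axis=-1, keepdims=True)

year1 = mock_embeddings(seed=2023)
year2 = mock_embeddings(seed=2024)

# Similarity search: dot product between a reference pixel and every
# pixel (the vectors are unit length, so this is cosine similarity).
ref = year1[10, 10]
similarity_map = np.einsum('ijk,k->ij', year1, ref)

# Change detection: per-pixel dot product between the two years;
# values near 1 mean little change, lower values mean more change.
change_map = np.sum(year1 * year2, axis=-1)
```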
Visualizing Embeddings using GEE and Python
Let's import GEE and the geemap library:
import ee
import geemap
You need to authenticate with your own GEE project. Here I'm using mine.
ee.Authenticate()
ee.Initialize(project='my-project-1527255156007')
We select the Satellite Embedding dataset:
dataset = ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
point = ee.Geometry.Point([-121.8036, 39.0372])
Let's get the images for 2023 and 2024. We'll also generate a similarity product between the two: the embedding vectors have unit length, so the per-pixel dot product acts as a cosine similarity.
image1 = dataset \
    .filterDate('2023-01-01', '2024-01-01') \
    .filterBounds(point) \
    .first()

image2 = dataset \
    .filterDate('2024-01-01', '2025-01-01') \
    .filterBounds(point) \
    .first()
vis_params = {'min': -0.3, 'max': 0.3, 'bands': ['A01', 'A16', 'A09']}
dot_prod = image1.multiply(image2).reduce(ee.Reducer.sum())
We can then visualize a colored composition with 3 of the 64 bands:
Map = geemap.Map()
Map.centerObject(point, 12)
Map.set_options('SATELLITE')
Map.addLayer(image1, vis_params, '2023 embeddings')
Map.addLayer(image2, vis_params, '2024 embeddings')
Map.addLayer(dot_prod, {'min': 0, 'max': 1, 'palette': ['white', 'black']},
             'Similarity between years (brighter = less similar)')
Map
Clustering using Embeddings
We'll now apply the embeddings to a practical clustering example. We start by selecting an area of interest.
AOI = ee.Geometry.Polygon([[[-51.829383948549825, -20.71028051938351],
                            [-51.829383948549825, -20.89899140074203],
                            [-51.477821448549825, -20.89899140074203],
                            [-51.477821448549825, -20.71028051938351]]])
Now, from the same dataset, we select our image and clip it to the area.
image_AOI = dataset \
    .filterDate('2024-01-01', '2025-01-01') \
    .filterBounds(AOI) \
    .mosaic().clip(AOI)
We display the image:
Map = geemap.Map()
Map.centerObject(AOI, 12)
Map.set_options('SATELLITE')
Map.addLayer(image_AOI, vis_params, '2024 embeddings')
Map
Let's download this image with the help of the geemap library:
!pip install geedim
geemap.download_ee_image(image_AOI, 'image_AOI.tif', scale=10, crs='EPSG:4326', region=AOI)
WARNING:geedim.download:Consider adjusting `region`, `scale` and/or `dtype` to reduce the image_AOI.tif download size (raw: 4.22 GB).
We install rasterio and import the image as a numpy array:
!pip install rasterio
import rasterio
from sklearn.cluster import KMeans
import numpy as np
import geemap
import geopandas as gpd
Within our area of interest, we will create 500 random points, extract the embeddings at each point, and use K-means to cluster them.
AOI_gdf = geemap.ee_to_gdf(ee.FeatureCollection(AOI))
AOI_gdf
|   | geometry |
|---|---|
| 0 | POLYGON ((-51.82938 -20.89899, -51.47782 -20.8... |
Here we generate the random points:
random_points = AOI_gdf.sample_points(size=500)
random_points_gdf = random_points.explode(index_parts=False).reset_index(drop=True)
random_points_gdf
|   | sampled_points |
|---|---|
| 0 | POINT (-51.82863 -20.80825) |
| 1 | POINT (-51.82754 -20.75538) |
| 2 | POINT (-51.82728 -20.77413) |
| 3 | POINT (-51.82695 -20.77285) |
| 4 | POINT (-51.8254 -20.72147) |
| ... | ... |
| 495 | POINT (-51.48195 -20.89392) |
| 496 | POINT (-51.48117 -20.77715) |
| 497 | POINT (-51.48102 -20.76665) |
| 498 | POINT (-51.48095 -20.82149) |
| 499 | POINT (-51.48087 -20.77834) |
500 rows × 1 columns
We extract the information from the image for each point:
coord_list = [(x, y) for x, y in zip(random_points_gdf.geometry.x, random_points_gdf.geometry.y)]

with rasterio.open('/content/image_AOI.tif') as src:
    Values = [v for v in src.sample(coord_list)]

Values = np.array(Values)
Then we apply clustering:
n_clusters = 5
kmeans = KMeans(n_clusters=n_clusters)
clusters = kmeans.fit(Values)
labels = clusters.predict(Values)
labels
array([1, 2, 3, 3, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 3, 1, 3, 2, 1, 1, 1, 3,
       ...
       3, 4, 2, 2, 2, 4, 4, 2, 2, 4, 2, 2, 2, 2, 2, 2], dtype=int32)
With the algorithm trained, we will apply clustering to all pixels in the image we downloaded:
with rasterio.open('/content/image_AOI.tif') as src:
    img_emb = src.read()

img_emb = img_emb.transpose([1, 2, 0])
l, c, b = img_emb.shape
We reshape the image into a vector of pixels:
img_emb_vec = img_emb.reshape(l*c, b)
We replace non-finite values with zeros:
img_emb_vec[~np.isfinite(img_emb_vec)] = 0
We apply the trained model to every pixel:
clustered_image = clusters.predict(img_emb_vec)
Finally, we convert the prediction vector back into an image, display it with matplotlib, and save it to a .tif file.
clustered_image = clustered_image.reshape(l, c)
np.unique(clustered_image)
array([0, 1, 2, 3, 4], dtype=int32)
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
# Define a colormap with 5 colors
cmap = ListedColormap(['red', 'green', 'blue', 'yellow', 'purple'])
# Plot the clustered image
plt.figure(figsize=(10, 10))
plt.imshow(clustered_image, cmap=cmap)
plt.title('K-Means Clustering Result')
plt.colorbar(ticks=range(n_clusters))
plt.show()
with rasterio.open('/content/image_AOI.tif') as src:
    profile = src.profile

profile.update(
    dtype=rasterio.uint8,
    count=1,
    nodata=None,
    compress='lzw')

with rasterio.open('clustered_image.tif', 'w', **profile) as dst:
    dst.write(clustered_image.astype(rasterio.uint8), 1)